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Abstract 

Stochastic  analysis  of  random  heterogeneous  media  provides  useful  information  only 
if  realistic  input  models  of  the  material  property  variations  are  used.  These  input 
models  are  often  constructed  from  a  set  of  experimental  samples  of  the  underlying 
random  field.  To  this  end,  the  Karhunen-Loeve  (K-L)  expansion,  also  known  as 
principal  component  analysis  (PC A),  is  the  most  popular  model  reduction  method 
due  to  its  uniform  mean-square  convergence.  However,  it  only  projects  the  samples 
onto  an  optimal  linear  subspace,  which  results  in  an  unreasonable  representation  of 
the  original  data  if  they  are  non-linear ly  related  to  each  other.  In  other  words,  it 
only  preserves  the  second-order  statistics  (covariance)  of  a  random  field,  which  is 
insufficient  for  reproducing  complex  structures.  This  paper  applies  kernel  principal 
component  analysis  (KPCA)  to  construct  a  reduced-order  stochastic  input  model 
for  the  material  property  variation  in  heterogeneous  media.  KPCA  can  be  consid¬ 
ered  as  a  nonlinear  version  of  PC  A.  Through  use  of  kernel  functions,  KPCA  further 
enables  the  preservation  of  high-order  statistics  of  the  random  field,  instead  of  just 
two-point  statistics  as  in  the  standard  Karhunen-Loeve  (K-L)  expansion.  Thus,  this 
method  can  model  non-Gaussian,  non-stationary  random  fields.  In  addition,  poly¬ 
nomial  chaos  (PC)  expansion  is  used  to  represent  the  random  coefficients  in  KPCA 
which  provides  a  parametric  stochastic  input  model.  Thus,  realizations,  which  are 
consistent  statistically  with  the  experimental  data,  can  be  generated  in  an  efficient 
way.  We  showcase  the  methodology  by  constructing  a  low-dimensional  stochastic 
input  model  to  represent  channelized  permeability  in  porous  media. 
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1  Introduction 


Over  the  past  few  decades  there  has  been  considerable  interest  among  the 
scientific  community  in  studying  physical  processes  with  stochastic  inputs. 
These  stochastic  input  conditions  arise  from  uncertainties  in  boundary  and 
initial  conditions  as  well  as  from  inherent  random  material  heterogeneities. 
Material  heterogeneities  are  usually  difficult  to  quantify  since  it  is  physically 
impossible  to  know  the  exact  property  at  every  point  in  the  domain.  In  most 
cases,  only  a  few  statistical  descriptors  of  the  property  variation  or  only  a  set 
of  samples  can  be  experimentally  determined.  This  limited  information  neces¬ 
sitates  viewing  the  property  variation  as  a  random  field  that  satisfies  certain 
statistical  properties/correlations,  which  naturally  results  in  describing  the 
physical  phenomena  using  stochastic  partial  differential  equations  (SPDEs). 

In  the  past  few  years,  several  numerical  methods  have  been  developed  to  solve 
SPDEs,  such  as  Monte  Carlo  (MC)  method,  perturbation  approach  [1,2],  gen¬ 
eralized  polynomial  chaos  expansion  (gPC)  [3-6]  and  stochastic  collocation 
method  [7-13].  However,  implicit  in  all  these  developments  is  the  assumption 
that  the  uncertainties  in  the  SPDEs  have  been  accurately  characterized  as 
random  variables  or  random  fields  through  the  finite-dimensional  noise  as¬ 
sumption  [11].  The  most  common  choice  is  the  Karhunen-Loeve  (K-L)  expan¬ 
sion  [2,3,14,15],  which  is  also  known  as  linear  principal  component  analysis 
or  PCA  [16].  Through  K-L  expansion,  the  random  field  can  be  represented 
as  a  linear  combination  of  the  deterministic  basis  functions  (eigenfunctions) 
and  the  corresponding  uncorrelated  random  variables  (random  coefiicients) . 
The  computation  of  the  K-L  expansion  requires  the  analytic  expression  of  the 
covariance  matrix  of  the  underlying  random  field.  In  addition,  the  probability 
distribution  of  the  random  variables  is  always  assumed  known  a  priori.  These 
two  assumptions  are  obviously  not  feasible  in  realistic  engineering  problems. 
In  most  cases,  only  a  few  experimentally  obtained  realizations  of  the  random 
field  are  available.  Reconstruction  techniques  are  then  applied  to  generate 
samples  of  the  random  field  after  extracting  the  statistical  properties  of  the 
random  field  through  these  limited  experimental  measurements.  These  pro¬ 
cesses  are  quite  expensive  and  numerically  demanding  if  thousands  of  samples 
are  needed.  This  leads  to  the  problem  of  probabilistic  model  identification  or 
stochastic  input  model  generation,  where  the  purpose  is  to  find  a  parametric 
representation  of  the  random  filed  through  only  limited  experimental  data. 

To  this  end,  a  polynomial  chaos  (PC)  representation  of  the  random  field 
through  experimental  data  was  first  proposed  in  [17]  and  improved  in  subse¬ 
quent  papers  [18-23].  This  framework  consists  of  three  steps:  (1)  Computing 
the  covariance  matrix  of  the  data  numerically  using  the  available  experimental 
data;  (2)  Stochastic  reduced-order  modeling  with  a  K-L  expansion  to  obtain 
a  set  of  deterministic  basis  (eigenfunctions)  and  the  corresponding  random 
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expansion  coeiRcients  (called  K-L  projected  random  variables);  (3)  A  poly¬ 
nomial  chaos  representation  of  these  random  coeiRcients  is  constructed  given 
the  realizations  of  these  coeiRcients  which  are  calculated  from  data.  These 
realizations  are  then  used  for  the  estimation  of  the  deterministic  coeiRcients 
in  the  PC  representation,  where  several  methods  have  been  proposed.  In  the 
pioneering  work  [17],  maximum  likelihood  estimation  was  used  to  hnd  the  PC 
representation  of  the  K-L  random  variables.  In  [18],  a  Bayesian  inference  ap¬ 
proach  was  used  instead  to  construct  the  PC  representation  of  the  random 
held.  However,  these  two  works  did  not  take  into  account  the  dependencies 
between  various  components  of  the  K-L  projected  random  variables.  In  [19], 
the  Rosenblatt  transformation  [24]  was  used  to  capture  these  dependencies 
and  maximum  entropy  approach  together  with  Fisher  information  matrix  was 
used  for  the  estimation  of  the  PC  coeiRcients.  In  [20,21,25],  a  non-intrusive 
projection  approach  with  Rosenblatt  transformation  was  developed  for  the 
PC  estimation.  Apart  from  PC  representation,  in  [26],  the  distribution  of  the 
K-L  random  variables  was  assumed  directly  to  be  uniform  within  the  range 
of  the  realizations  of  these  coeiRcients  from  data.  Later,  in  [27],  the  distribu¬ 
tion  of  the  K-L  random  variables  was  still  assumed  to  be  uniform.  However, 
the  range  of  the  uniform  distribution  was  found  through  enforcing  the  sta¬ 
tistical  constraints  of  the  random  Reid  and  solving  the  resulting  optimization 
problems.  In  [28],  the  uncertain  input  parameters  are  modeled  as  independent 
random  variables,  whose  distributions  are  estimated  using  a  diffusion-mixed- 
based  estimator.  Except  the  work  in  [28],  all  the  previous  developments  rely 
heavily  on  the  K-L  expansion  for  the  reduced-order  modeling.  However,  the 
K-L  expansion  has  one  major  drawback.  The  K-L  expansion  based  stochastic 
model  reduction  scheme  constructs  the  closest  linear  subspace  of  the  high¬ 
dimensional  input  space.  In  other  words,  it  only  preserves  the  covariance  of 
the  random  Reid,  and  therefore  is  suitable  for  multi-Gaussian  random  Reids. 
But  most  of  the  random  samples  contain  essential  non-linear  structures,  e.g. 
higher-order  statistics.  This  directly  translates  into  the  fact  that  the  standard 
linear  K-L  expansion  tends  to  over-estimate  the  actual  intrinsic  dimensional¬ 
ity  of  the  underlying  random  Reid.  Hence,  one  needs  to  go  beyond  a  linear 
representation  of  the  high-dimensional  input  space  to  accurately  access  the 
effect  of  its  variability  on  the  output  variables. 

To  resolve  this  issue,  the  authors  in  [29]  proposed  a  nonlinear  model  reduc¬ 
tion  strategy  for  generating  data-driven  stochastic  input  models.  This  method 
is  based  on  the  manifold  learning  method,  where  the  principal  of  multidi¬ 
mensional  scaling  (MDS)  is  utilized  to  map  the  space  of  viable  property 
variations  to  a  low-dimensional  region.  Then  isometric  mapping  from  this 
high-dimensional  space  to  a  low-dimensional,  compact,  connected  set  is  con¬ 
structed  via  preserving  the  geodesic  distance  between  data  using  the  IsoMap 
algorithm  [30].  However,  this  method  has  two  major  issues.  First,  after  dimen¬ 
sionality  reduction,  it  only  gives  us  a  set  of  low-dimensional  points.  It  does 
not  give  us  the  inherent  patterns  (similar  to  the  eigenfunctions  as  in  the  K-L 
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expansion)  in  the  embedded  random  space.  Therefore,  it  cannot  provide  us  a 
mathematical  parametric  input  model  as  in  the  K-L  expansion,  i.e.  we  want 
to  hnd  the  form  y  =  /(^),  where  vector  y  is  a  realization  of  a  discrete  ran¬ 
dom  held,  and  vector  of  dimension  much  smaller  than  the  original  input 
stochastic  dimension,  is  a  set  of  independent  random  variables  with  a  speci- 
hed  distribution.  In  addition,  when  new  experimental  data  becomes  available, 
this  nonlinear  mapping  needs  to  be  recomputed.  Secondly,  the  IsoMap  algo¬ 
rithm  requires  the  computation  of  the  geodesic  distance  matrix  among  data. 
In  general,  this  matrix  may  be  not  well  dehned  for  real  data.  Even  if  it  is  well 
dehned,  the  algorithm  is  computationally  expensive. 

Both  problems  of  the  K-L  expansion  and  the  non-linear  model  reduction  algo¬ 
rithm  in  [29]  can  be  solved  with  kernel  principal  component  analysis  (KPCA), 
which  is  a  nonlinear  form  of  PCA  [31,32].  KPCA  has  proven  to  be  a  powerful 
tool  as  a  nonlinear  feature  extractor  of  classihcation  algorithm  [31],  pattern 
recognization  [33] ,  image-denosing  [34]  and  statistical  shape  analysis  [35] .  The 
basic  idea  is  to  map  the  data  in  the  input  space  to  a  feature  space  F  through 
some  nonlinear  map  $,  and  then  apply  the  linear  PCA  there.  Through  the 
use  of  a  suitably  chosen  kernel  function,  the  data  becomes  more  linearly  re¬ 
lated  in  the  feature  space  F.  In  the  context  of  stochastic  modeling,  there  are 
two  pioneered  works  [36,37].  In  [36],  KPCA  was  used  to  construct  the  prior 
model  of  the  unknown  permeability  field  and  then  gradient-based  algorithm  is 
used  to  solve  the  history-matching  problem.  However,  the  random  expansion 
coefficients  of  the  linear  PCA  in  the  feature  space  are  assumed  i.i.d.  standard 
normal  random  variables.  This  choice  clearly  does  not  capture  the  statical  in¬ 
formation  from  the  data.  In  [37],  KPCA  is  used  in  the  feature  space  F  for  the 
selection  of  a  subset  of  representative  realizations  containing  similar  properties 
to  the  larger  set. 

Motivated  by  all  the  above  mentioned  works,  in  this  paper,  a  stochastic  non¬ 
linear  model  reduction  framework  based  on  KPCA  is  developed.  To  be  specific, 
the  KPCA  is  first  used  to  construct  the  stochastic  reduced-order  model  in  the 
feature  space.  The  random  coefficients  of  the  linear  PCA  are  then  represented 
via  PC  expansion.  Because  the  K-L  expansion  is  performed  in  the  feature 
space,  the  resulting  realizations  he  in  the  feature  space,  and  therefore,  the 
mapping  from  the  feature  space  back  to  the  input  space  is  needed.  This  is 
called  the  “pre-image  problem”  [34,38].  The  pioneering  work  in  solving  the 
pre- image  problem  is  Mika’s  fixed-point  iterative  optimization  algorithm  [34]. 
However,  this  method  suffers  from  numerical  instabilities.  It  is  sensitive  to  the 
initial  guess  and  is  likely  to  get  stuck  in  local  minima.  Therefore,  it  is  not 
suitable  for  our  stochastic  modeling.  It  is  noted  that  this  algorithm  was  also 
used  in  the  work  [36,37]  to  find  the  pre- image.  In  [38],  the  authors  determine 
a  relationship  between  the  distances  in  the  feature  space  and  the  distances 
in  the  input  space.  The  MDS  is  used  to  find  the  inverse  mapping  and  thus 
the  pre-image.  This  method  is  expensive  since  it  involves  a  singular  value 
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decomposition  on  a  matrix  of  nearest  neighbors.  In  this  work,  we  propose  a 
new  approach  to  find  the  pre-image.  It  is  based  on  local  linear  interpolation 
among  n-nearest  neighbors  using  only  the  distances  in  the  input  space,  which 
is  similar  to  the  method  proposed  in  our  earlier  work  [29]. 

This  paper  is  organized  as  follows:  In  the  next  section,  the  mathematical 
framework  of  KPCA  is  considered.  In  Section  3,  the  PC  representation  of  the 
random  coefficients  is  developed.  In  Section  4,  the  new  pre-image  algorithm 
is  introduced.  A  example  with  channelized  permeability  is  given  in  Section  5. 
Finally,  concluding  remarks  are  provided  in  Section  6. 


2  Kernel  principal  component  analysis  of  random  fields 


2.1  Problem  definition 


Let  us  define  a  complete  probability  space  with  sample  space 

which  corresponds  to  the  outcomes  of  some  experiments,  .F  C  2^  is  the  a- 
algebra  of  subsets  in  Q  and  "P  :  .F  — [0, 1]  is  the  probability  measure.  Also, 
let  us  define  D  as  a  two-dimensional  bounded  domain.  Denote  a.{x,u)  the 
random  (property)  field  used  to  describe  and  provide  a  mathematical  model 
of  the  available  experimental  data.  The  random  field  a{x,u)  in  general  be¬ 
longs  to  an  infinite-dimensional  probability  space.  However,  in  most  cases, 
the  random  field  is  associated  with  a  spatial  discretization.  Thus  we  can  have 
a  finite-dimensional  representation  of  the  random  field  which  can  be  repre¬ 
sented/described  as  a  random  vector  y  :=  (yi, . . .  ^Vm)^  '■  ^  M  can 

be  regarded  as  the  number  of  grid  blocks  in  a  discretized  model.  So  each 
yj,  i  =  1, . . . ,  M  is  a  random  variable  which  represents  the  random  property  in 
each  grid  block.  The  dimensionality  of  the  stochastic  model  is  then  the  length 
of  this  vector  M.  Let  y,,  z  =  1, . . . ,  A’  be  N  real  column  vectors  in  i.e. 
Yi  G  E^,  representing  N  independent  realizations  of  the  random  field. 

In  most  cases,  M  would  be  a  large  number.  Our  problem  is  to  find  a  reduced- 
order  polynomial  chaos  representation  of  this  random  field  that  is  consistent 
with  the  data  in  some  statistical  sense.  To  be  specific,  we  want  to  find  a 
form  y  =  /(^),  where  vector  of  dimension  much  smaller  than  the  original 
input  stochastic  dimension  M,  is  a  set  of  independent  random  variables  with  a 
specified  distribution.  Therefore,  by  drawing  samples  ^  in  this  low-dimensional 
space,  we  obtain  different  realizations  of  the  underlying  random  field. 
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2.2  Basic  idea  of  KPC A 


Fig.  1  demonstrates  the  basic  idea  behind  nonlinear  kernel  PCA.  Consider 
a  random  field  y  =  (?/i,|/2)^  G  If  y  is  non-Gaussian,  yi  and  1/2  can  be 
nonlinear ly  related  to  each  other  in  In  this  case,  linear  PCA  or  K-L  ex¬ 
pansion  attempt  to  fit  a  linear  surface  such  that  the  reconstruction  error  is 
minimized  (Fig.  1,  left).  This  clearly  results  in  a  poor  representation  of  the 
original  data.  Now,  consider  a  nonlinear  mapping  $  that  relates  the  input 
space  to  another  space  F 

$  :  E"  ^  F,  y  ^  Y.  (1) 

We  will  refer  F  as  the  feature  space.  In  the  right  figure  of  Fig.  1,  the  realizations 
that  are  nonlinearly  related  in  become  linearly  related  in  the  feature  space 
F.  Linear  PCA  or  K-L  expansion  can  now  performed  in  F  in  order  to  determine 
the  principal  directions  in  this  space. 
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Fig.  1.  Basic  idea  of  KPCA.  Left:  In  this  non-Gaussian  case,  the  linear  PCA  is  not 
able  to  capture  the  nonlinear  relationship  among  the  realizations  in  the  original 
space.  Right:  After  the  nonlinear  mapping  $,  the  realizations  becomes  linearly  re¬ 
lated  in  the  featnre  space  F.  Linear  PCA  or  K-L  expansion  can  now  be  performed 
in  F. 

2.3  PCA  in  feature  space 


In  this  section,  the  theory  of  KPCA  is  briefly  reviewed.  For  a  detailed  intro¬ 
duction,  the  authors  may  refer  to  [31,38]. 

As  shown  before,  kernel  PCA  can  be  considered  to  be  a  generalization  of  linear 
PCA  in  the  feature  space  F.  Thus,  all  results  on  linear  PCA  can  be  readily 
generalized  for  KPCA.  Now,  assume  we  are  given  N  number  of  realizations 
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of  the  random  field  {yi, . . .  ,yjv},  where  each  realization  is  represented  as  a 
high-dimensional  column  vector  y*  G  (e.g.  M  can  be  considered  as  the 
number  of  grid  blocks  in  the  discretization).  The  maps  of  the  realizations  in 
the  feature  space  F  are  'h(yj),  i  =  1, ...  ,N.  Denote  the  mean  of  the  •h-mapped 
data  by  $  =  ^  and  define  the  “centered”  map  $  as 

$  =  $(y)  -  (2) 

Analogous  to  linear  PCA,  we  need  to  find  eigenvectors  V  and  eigenvalues  A 
of  the  covariance  matrix  C  in  the  feature  space,  where 

C  =  Ashy.Wyif-  (3) 

i=l 

The  dimension  of  this  matrix  is  Np  x  Np,  where  Np  is  the  dimension  of  the 
feature  space.  As  explained  in  [31],  Np  could  be  extremely  large.  As  a  result,  it 
will  be  impossible  to  compute  the  C  and  solve  the  eigenvalue  problem  directly. 

Thus,  as  in  [31],  a  kernel  eigenvalue  problem  is  formulated  which  uses  only 
dot  products  of  vectors  in  the  feature  space.  We  first  substitute  the  covariance 
matrix  into  the  eigenvalue  problem  CV  =  AV  and  obtain  [36] 

cv  =  i  i;  (i(y.)  ■  v)  4(7,),  (4) 

i=l 

which  implies  that  all  solutions  V  with  A  7^  0  he  in  the  span  of  $(yi), . . . ,  $(yjv)  [31]. 
Therefore,  we  can  expand  the  solution  V  as  [31] 

V  =  i;a,4(yj),  (5) 

i=i 

and  the  eigenvalue  problem  is  equivalent  to  [31] 

A(l>(y,)  •  V)  =  (l>(y,)  •  CV),  V  ^  =  1, . . . ,  iV.  (6) 

Now,  substituting  Eq.  (5)  into  Eq.  (6),  we  obtain  [31] 

^  Z  ■  ^(yj))  =  iG  £  ( ^(yi)  ■  £  ^(y^) )  (^(y^)  •  ^(yi))  >  (7) 

j=l  j=l  \  k=l  / 

for  i  =  1, . . . ,  N.  Now  define  the  N  x  N  kernel  matrix  K  which  is  the  dot 
product  of  vectors  in  the  feature  space  F: 

K  :  Kt,  =  (4>(y.)  ■  4>(y,)) .  (8) 

Define  the  N  x  N  centering  matrix  H  =  I  —  where  I  is  the  N  x  N 

identity  matrix  and  1  =  [11 . . .  1]^  is  a  V  x  1  vector.  Thus,  the  centered  kernel 
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matrix  K  :  Kij  =  ($(yi)  •  can  be  computed  as 

K  =  HKH.  (9) 

Substituting  Eq.  (9)  into  Eq.  (7),  we  arrive  at  the  following  kernel  eigenvalue 
problem  [31]: 

NXa  =  Ka.  (10) 

In  the  following,  for  simplicity,  we  will  denote  A  as  the  eigenvalues  of  K,  i.e. 
the  solutions  NX  in  Eq.  (10).  We  rewrite  Eq.  (10)  in  the  following  matrix  form: 

AU  =  KU,  (11) 

where,  A  =  diag(Ai, . . . ,  An)  is  the  diagonal  matrix  of  the  corresponding  eigen¬ 
values  and  U  =  [ai, . . . ,  ajv]  with  a*  =  [an, . . . ,  is  the  matrix  contain¬ 

ing  the  eigenvectors. 

The  eigenvectors  need  to  normalized,  which  is  =  1.  Normalizing  the 

solution  Vj  in  F  translates  into  Xi{ai  ■  cxi)  =  1.  From  the  solution  of  the 
eigenvalue  problem,  we  can  write  a*  •  aj  =  1,  since  the  eigenvectors  from  the 
eigenvalue  problem  in  Eq.  (11)  are  normalized.  Therefore,  if  we  divide  cxi  by 
y/Xi,  we  obtain  oti/y/Xi-  oti/y/Xi  =  l/A*. 


Therefore,  through  Eq.  (5),  the  zth  orthornormal  eigenvector  of  the  covariance 
matrix  C  in  the  feature  space  can  be  shown  to  be  [31,38] 


a. 


=  E  Wi$(yi),  where  a^j  = 

j=l  i=i 


(12) 


Let  y  be  a  realization  of  the  random  field,  with  a  mapping  $(y)  in  F.  Then 
$(y)  can  be  decomposed  in  the  following  way: 

N 

^y)  =  J2ziVi  +  ^,  (13) 

i=l 

where  Zi  is  the  projection  coefficient  onto  the  zthe  eigenvector  Vp 

Zi  =  Yi-  d>(y)  =  y  aij  (l>(y)  •  $(yj))  .  (14) 

i=i 


2-4  Computing  dot  products  in  feature  space 


From  Eq.  (9),  it  is  seen  that  in  order  to  compute  the  kernel  matrix,  only  the 
dot  products  of  vectors  in  the  feature  space  F  are  required,  while  the  explicit 


calculation  of  the  map  $(y)  does  not  need  to  be  known.  As  shown  in  [31],  the 
dot  product  can  be  computed  through  the  use  of  the  kernel  function.  This  is 
the  so  called  “kernel  trick” .  Not  all  arbitrary  functions  but  the  Mercer  kernels 
can  the  used  as  a  kernel  function  [31].  The  kernel  function  k{yi,yj)  calculates 
the  dot  product  in  space  F  directly  from  the  vectors  of  the  input  space 

=  (^(Yi) -^(yi))-  (is) 

The  commonly  used  kernel  functions  are  polynomial  kernel  and  Gaussian  ker¬ 
nels. 


2.4-1  Kernel  for  linear  PC  A 

If  the  kernel  function  is  chosen  as  polynomial  kernel  of  order  one 


kiyi,yj)  =  (yi-yj),  (16) 

then  the  linear  PCA  is  actually  performed  on  the  sample  realizations.  It  is 
noted  that  the  use  of  the  kernel  matrix  to  perform  the  linear  PCA  is  actually 
the  same  as  the  “method  of  snapshots”  which  is  well  known  in  the  area  of 
reduced  order  modeling  [39].  This  method  is  more  computationally  efficient 
than  the  standard  implementation  of  the  K-L  expansion  as  in  [17].  Using 
the  kernel  matrix,  only  a  eigenvalue  problem  of  size  A  x  A’  is  needed,  whereas 
in  [17]  the  size  of  the  eigenvalue  problem  is  M  x  M.  In  most  cases,  the  number 
of  available  experimental  data  is  much  smaller  than  the  dimensionality  of  the 
data  itself. 


2-4.2  Kernel  for  nonlinear  PCA 

Choosing  a  nonlinear  kernel  function  leads  to  performing  nonlinear  PCA.  The 
most  common  kernel  function  is  the  Gaussian  kernel: 

k{yi,yj)  =  exp 

where  ||yj  —  y^lp  is  the  squared  L2-distance  between  two  realizations.  The 
kernel  width  parameter  a  controls  the  flexibility  of  the  kernel.  A  larger  value 
of  a  allows  more  “mixing”  between  elements  of  the  realizations,  whereas  a 
smaller  value  of  a  uses  only  a  few  signiflcant  realizations.  As  recommended 
in  [35],  a  typical  choice  for  a  is  the  average  minimum  distance  between  two 
realizations  in  the  input  space: 

1  ^ 

f^^  =  c^5IminjVi||y*-yif>  i  =  l,  ...,A,  (18) 

i=l 

where  c  is  a  user-controlled  parameter. 


9 


2.5  Stochastic  reduced- order  modeling  via  KPCA 


Since  only  the  dot  products  of  vectors  in  the  feature  space  are  available  through 
use  of  the  kernel  function,  we  hrst  rewrite  Eq.  (14)  in  terms  of  the  kernel 
function  [38]: 


JV  ^  N  ^ 

p  =  (^(y)  •  ^(yt))  =  51  (y>  yt)  >  (i9) 

where 

^(y,  Yj)  =  k{Y,  Yj)  -  (20) 

with 

ky  =  [^(y,yi),---,^(y,yiv)]^-  (21) 

We  can  write  Eq.  (19)  in  a  matrix  form: 

Z  =  A'^ky  +  b,  (22) 

where  A  =  Ha  and  b  =  —  ^a^HKl. 

Now  suppose  the  eigenvectors  are  ordered  by  decreasing  eigenvalues  and  we 
can  only  work  in  the  low-dimensional  subspace  which  is  spanned  by  the  hrst 
r  largest  eigenvectors,  where,  in  general,  r  -C  A.  Then  the  decomposition  in 
Eq.  (13)  can  be  truncated  after  the  hrst  r  terms: 

My)  ~  E  +  $  =  E  PMY^)  (23) 

i=l  i=l 

where  /?,  is  the  ith  component  of  the  vector  (3  =  A.7i  +  ^1.  It  is  noted  that 
since  only  the  hrst  r  eigenvectors  are  used,  a  used  in  Eq.  (22)  only  contains 
the  hrst  r  columns  of  the  original  matrix. 

Then  the  stochastic  reduced-order  input  model  in  the  feature  space  can  be 
dehned  as:  for  any  realization  Y  E  F,  we  have 

Y,  =  EA$(yi),  with  /3  =  A^  +  ll.  (24) 

i=i 

Here,  the  subscript  r  is  to  emphasize  that  the  realization  is  reconstructed 
using  only  the  hrst  r  eigenvectors.  ^  :=  [^j, . . . ,  is  a  r-dimensional  random 
vector. 

According  to  the  properties  of  the  K-L  expansion  [3,15],  the  random  vectors 
^  satisfy  the  following  two  conditions: 

-E'Ki]  =  0,  i,j  =  l,...,r.  (25) 
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Therefore,  the  random  coefficients  are  uncorrelated  but  not  independent. 
The  realizations  of  these  random  coefficients  can  be  obtained  through  Eq.  (22) 

=  i  =  (26) 

Our  problem  then  reduces  to  identify  the  random  vector  ^  :=  [^j, . . . , 
given  its  N  samples  i  =  1, ...  ,N.  A  polynomial  chaos 

representation  is  introduced  in  the  next  section. 

Finally,  similarly  to  the  K-L  expansion  [3,15],  the  minimum  mean  squared 
error  (MSE)  of  truncated  expansion  Eq.  (24)  in  the  feature  space  can  be 
shown  to  be 

e(||Y-Y,||^)  f;  A,  (27) 

''''  i=r+l 

Remark  1.  For  linear  PC  A,  we  only  need  to  replace  the  kernel  function  in 
Eq.  (22)  with  the  kernel  function  for  linear  PCA  Eq.  (16)  and  replace  $(yi) 
in  Eq.  (24)  with  the  data  directly. 


3  Polynomial  chaos  representation  of  the  stochastic  reduced  order 
model 


The  problem  is  now  to  identify  a  random  vector  ^  :  0  ^  E’’,  given  a  set  of 
independent  samples  For  this  purpose,  we  use  a  polynomial  chaos 

(PC)  expansion  to  represent  Several  methods  have  been  proposed  to  solve 
the  expansion  coefficients  in  the  resulted  PC  expansion,  such  as  maximum 
likelihood  [17],  Beyesian  inference  [18],  maximum  entropy  method  [19]  and 
non-intrusive  projection  method  [21,25].  As  mentioned  before,  the  components 
of  random  vector  ^  are  uncorrelated  but  not  independent.  Although  Rosenbllat 
transformation  can  be  used  to  reduce  the  problem  to  a  set  of  independent 
random  variables  [20,25],  it  is  computationally  expensive  for  high-dimensional 
problems.  In  this  work,  to  reduce  the  computational  cost,  we  further  assume 
the  independence  between  components  of  This  is  generally  not  the  case  for 
arbitrary  stochastic  processes.  However,  in  the  works  [18]  and  [21],  it  has  been 
numerically  verified  that  this  strong  hypothesis  gives  accurate  results. 


3. 1  PC  expansion  of  random  variables 


The  theory  and  properties  of  the  PC  expansion  have  been  well  documented 
in  various  references  [3-5].  In  this  approach,  any  random  variable  with  fi¬ 
nite  variance  can  be  expanded  in  terms  of  orthogonal  polynomials  of  specific 
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standard  random  variables.  Since  each  is  independent,  it  can  be  separately 
decomposed  onto  an  one- dimensional  PC  basis  of  degree  p: 

6  i  =  l,...,r,  (28) 

i=o 

where  the  rji  are  i.i.d.  random  variables.  The  random  basis  functions 
are  chosen  according  to  the  type  of  random  variable  {rii}  that  has  been  used 
to  describe  the  random  input.  For  example,  if  Gaussian  random  variables  are 
chosen  then  the  Askey  based  orthogonal  polynomials  are  chosen  to  be 

Hermite  polynomials,  if  rji  are  chosen  to  be  uniform  random  variables,  then 
{Tj}  must  be  Legendre  polynomials  [4].  Although  the  Hermite  polynomials 
are  used  in  this  paper,  the  method  developed  can  be  applied  to  generalized 
polynomial  chaos  expansions.  The  Hermite  polynomials  are  given  by 

To(?7i)  =  l, 

'^j+i{Vi)  =  Vi'^j{Vi)  -  if  i  ^  1-  (29) 


The  above  one-dimensional  Hermite  polynomials  are  orthogonal  with  respect 
the  corresponding  probability  density  function  (PDF)  of  the  standard  normal 
random  variable: 

1  /•+°° 

=  —=  /  ^  dri  =  i\  (30) 

V  ZTT  J-OO 

Thus,  the  PC  coefficients  can  be  computed  through  Galerkin  projection: 


E 


»?(>?) 


r+oo 

/  2  dr], 

J  —OO 


i  =  j  =  0,...,p. 

(31) 


A  numerical  integration  is  needed  to  evaluate  this  integral.  However,  it  is 
noted  that  the  random  variable  ^  does  not  belong  to  the  same  stochastic 
space  as  r^,  a  non-linear  mapping  F  :  ry  — >  ^  is  thus  needed  which  preserves  the 
probabilities  such  that  F(r7)  and  ^  have  the  same  distributions.  Here,  a  non- 
intrusive  projection  method  using  empirical  cumulative  distribution  functions 
(CDFs)  of  samples,  which  was  developed  in  [21],  is  utilized  to  compute  the 
PC  coefficients. 


3.2  A  non-intrusive  projection  method  for  calculating  PC  coefficients 


The  non-linear  mapping  P  :  ry  ^  ^  can  be  defined  by  employing  the  Rosenblatt 
transformation  [24]  as  shown  below  for  each 

^,^R(ry,),  R  =  (32) 
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where  and  Fr^  denote  the  CDFs  of  and  r],,  respectively.  Here,  the 
equalities,  “=”  should  be  interpreted  in  the  sense  of  distribution  such  that 
the  PDFs  of  random  variables  on  both  sides  are  equal.  The  statistical  toolbox 
of  MATLAB  provides  functions  to  evaluate  the  CDF  of  many  standard  PC 
random  variables. 

However,  the  marginal  CDF  of  the  samples  is  not  known  and  can  only 
be  evaluated  numerically  from  the  available  data.  Here,  the  kernel  density 
estimation  approach  is  used  to  construct  the  empirical  CDF  of  [40].  Now, 
let  be  N  samples  of  obtained  from  Eq.  (26).  Then  the  marginal 

pdf  of  is  evaluated  as: 


1  ^  1 


(33) 


where  the  bandwidth  r  should  be  chosen  to  balance  smoothness  and  accuracy. 
Then  CDF  of  ^  can  be  obtained  by  integrating  Eq.  (33)  and  then  the  inverse 
CDE  is  computed.  The  MATLAB  function,  ksdensity,  in  statistical  tool  box 
is  used  to  find  the  inverse  CDE  of  where  the  r  is  automatically  computed 
from  the  information  of  data. 


Now  the  expectation  in  the  Galerkin  projection  formulate  Eq.  (31)  is  well 
defined  and  can  be  computed  in  r^-space.  Then,  the  coefficients  of  the  PC 
expansion  can  be  computed  using  a  Gauss-Hermite  quadrature: 

1  /•+00  I  ^9 

/  6^i(h)e““d?7  ^  Ukri{V2iik)'^j{V2i^k),  (34) 

V27rj!2-oo  V^J- k=i 

where  the  {cufc,  f^k}k=i  integration  weights  and  points  of  Gauss-Hermite 
quadrature.  It  is  noted  here  that  a  transformation  r]  =  v^/i  is  used  since 
the  weight  in  the  Gauss-Hermite  quadrature  is  e~^  while  the  PDE  of  Gauss 
random  variable  is  ^=6“*^ 

y  ZTT 


3.3  PC  representation  of  the  random  field 


Now,  putting  both  KPCA  and  PG  expansion  together,  one  arrives  at  the 
following  r-dimensional  PC  representation  of  the  stochastic  random  field  in 
the  feature  space: 


JV  1 

Y,  =  E A$(yz),  with  /3  =  AC  +  -1, 

i=l 


(35) 
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where  the  r  x  1  column  vector  ^  is: 


c 


I]  Jlj'i’jiVl),  7ri^'i(hr) 

J=0  j=0 


(36) 


Therefore,  by  drawing  i.i.d.  samples  of  standard  Gaussian  random  variables 
rii,i  =  1, . . .  ,r,  we  obtain  different  realizations  of  the  underlying  random  field 
in  the  feature  space  F.  Now  the  dimensionality  of  the  stochastic  input  space 
is  successfully  reduced  from  a  large  number  M  to  a  small  number  r. 


4  The  pre-image  problem  in  KPCA 


The  simulated  realizations  of  the  random  field  from  Eq.  (35)  are  in  the  feature 
space  F.  However,  we  are  interested  in  obtaining  realizations  in  the  physical 
input  space.  Therefore,  an  inverse  mapping  is  required  as  y  =  d>“^(Y).  This 
is  known  as  the  pre-image  problem  [34,38].  As  demonstrated  in  [34],  this  pre¬ 
image  may  not  exist  or  if  it  exists,  it  may  be  not  unique.  Therefore,  we  can 
only  settle  for  an  approximate  pre- image  y  such  that  d>(y)  ~  Y^. 


4-1  Fixed-point  iteration  for  finding  the  pre-image 


One  solution  to  the  pre- image  problem  is  to  address  this  problem  as  a  nonlinear 
optimization  problem  by  minimizing  the  squared  distance  in  the  feature  space 
between  d>(y)  and  Y^: 

p(y)  =  ll^(y)  -  (37) 

The  extremum  can  be  obtained  by  setting  VyP  =  0.  For  Gaussian  kernel 
Eq.  (17),  this  nonlinear  optimization  problem  can  be  solved  by  a  fixed-point 
iteration  method  [34]: 


^  _  E^=i  A  exp(-||yt  -y^||V(2(7^))y^ 

E*=i  fii  exp  {-\\yt  -  yi||V(2cT2)) 

As  can  be  easily  seen,  the  pre-image  in  this  case  will  depend  on  the  initial 
starting  point  and  is  likely  to  get  stuck  in  local  minima.  In  addition,  this 
scheme  is  numerically  unstable  and  one  has  to  try  a  number  of  initial  guesses. 
Therefore,  it  is  not  suitable  for  our  stochastic  simulation  since  we  need  to 
have  a  one  to  one  mapping  and  to  find  an  efficient  way  for  generating  a  large 
number  of  samples. 

Furthermore,  it  is  noted  that  the  pre-image  obtained  in  this  scheme  is  in  the 
span  of  all  realizations  y^’s,  i.e.  it  is  a  linear  combination  of  all  the  available 
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realizations: 


where  the  weights 


N 

i=l 


N 

T,f>i  =  h 


i=l 


ftexp(-||y,-y.||V(2<T^)) 

Efcift  exp(-||y, -yi||7(2iT2))' 


(39) 


(40) 


Due  to  the  exponential  term  in  the  Gaussian  kernel,  the  contributions  of  the 
realizations  typically  drop  rapidly  with  increasing  distance  from  the  pre-image. 
In  other  words,  the  influence  of  training  realizations  with  smaller  distance  to  y 
will  tend  to  be  bigger.  Therefore,  it  is  reasonable  to  use  only  nearest  neighbors 
of  the  pre-image  for  local  linear  interpolation. 


4-2  Local  linear  interpolation  for  finding  the  pre-image 


As  illustrated  in  the  last  section,  we  can  find  the  pre-image  using  local  lin¬ 
ear  interpolation  within  n- nearest  neighbors.  Motivated  by  our  previous  work 
in  [29],  the  Euclidean  distances  are  used  as  the  interpolation  weights.  Actually, 
there  exists  a  simple  relationship  between  the  feature-space  and  input-space 
distance  for  Gaussian  kernel  [38,41].  The  basic  idea  of  the  method  is  shown 
in  Fig.  2.  For  an  arbitrary  realization  in  F,  we  first  calculate  its  distances 
to  the  nearest  neighbors  in  the  feature  space.  Then  the  distances  in  the  input 
space  are  recovered.  Finally,  the  input-space  distances  are  used  as  the  local 
linear  interpolation  weights. 


Fig.  2.  Basic  idea  of  the  proposed  pre-image  method. 

Given  any  realization  G  F,  we  can  compute  the  squared  feature  distance 
between  Y^  to  the  Ah  mapped  data  as: 

(Y„$(y,))  =  ||Y,  -  d>(y,)f  =  ||d>(y.)f  +  ||Y,f  -  2Yj$(y,),  (41) 

for  z  =  1,  ...,A’.  Recall  that  for  Gaussian  kernel,  k{yi,yi)  =  1  and  Y^  = 
Then  for  each  feature  distance  df  (Y^,  $(yi)) ,  z  =  1, . . . ,  Y,  we 
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can  compute  them  in  the  following  matrix  form: 

=  1  +  (3^K(3  -  2K/3,  (42) 

where  the  vector  =  [d\, . . .  ,d\[Y ■  We  thus  can  sort  this  vector  in  ascend¬ 
ing  order  to  identify  the  n- nearest  neighbors  with  respect  to  Y^,  $(yj),i  = 

On  the  other  hand,  the  squared  feature  distance  between  the  d>-map  of  the 
pre-image  y  and  the  zth  mapped  data  is: 


rf7($(y),$(y.))  =  ll$(y)-$(y.)f 

=  fc(y,  y)  +  ^(y*,  y*)  -  2A:(y,  y^) 

=  2(1  - /c(y,yi)),  (43) 

for  i  =  1, . . . ,  Y  and  where  we  have  used  /c(y,  y)  =  /^(yi,  yi)  =  1  again  since 
Gaussian  kernel  is  used.  Furthermore,  we  have  the  squared  input-space  dis¬ 
tance: 

\  (  l|y-yiin 

^(y,  y*)  =  exp  I - — —  1 , 

from  which  we  obtain 


d}  =  ||y  -  yif  =  -2(j2log(/i:(y,yi)),  (44) 

for  i  =  1, . . . ,  Y.  Substituting  Eq.  (43)  into  Eq.  (44),  one  arrives  at 

dj  =  ||y  -  yill^  =  -2(T^log(l  -  O.Sd-),  (45) 

for  z  =  1, . . . ,  Y.  Because  we  try  to  hnd  an  approximate  pre-image  such  that 
d>(y)  PS  Yr,  it  is  straightforwardly  to  identify  the  relationship  df  ~  d?  from 
Eqs.  (41)  and  (43).  Therefore,  the  squared  input-distance  between  the  approx¬ 
imate  pre-image  y  and  zth  realization  can  be  computed  by: 

dj  =  ||y  -  yill^  =  -2cT^log(l  -  O.Sd-),  (46) 

for  i  =  1, . . . ,  Y  and  where  d?  is  given  by  Eq.  (42). 

Finally,  the  pre- image  y  for  a  feature  space  realization  Y^  is  given  by 

.  ELi  i-y* 

y  =  ^ 

2^1=1  di 

where  yj,z  =  1, . . .  ,n  are  the  n- nearest  neighbors.  It  is  noted  that  here  we 
use  the  n-nearest  neighbors  in  the  feature  space.  However,  they  are  the  same 
as  the  n-nearest  neighbors  in  the  input  space  since  Eq.  (46)  is  monotonically 
increasing. 
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Therefore,  the  pre- image  y  of  an  arbitrary  realization  in  the  feature  space  is  the 
weighted  sum  of  the  pre-images  of  the  n- nearest  neighbors  of  in  the  feature 
space,  where  the  nearest  neighbors  are  taken  from  the  samples  yi,i  =  1, ...  ,N. 
This  local  linear  interpolation  procedure  is  based  on  the  principle  that  a  small 
region  in  a  highly  curved  manifold  can  be  well  approximated  as  a  linear  patch. 
It  is  easily  verified  that  the  interpolation  weights  satisfies  Eq.  (39).  Thus,  a 
unique  pre-image  can  now  be  obtained  using  simple  algebraic  calculations  in  a 
single  step  (no  iteration  is  required)  and  is  suitable  for  stochastic  simulation. 


5  Numerical  example 


In  this  section,  we  apply  Kernel  PCA  on  modeling  random  permeability  field 
of  complex  geological  channelized  structures.  This  structure  is  characterized 
by  multipoint  statistics  (MPS),  which  expresses  the  joint  variability  at  many 
more  than  two  locations  [42].  Therefore,  only  mean  and  correlation  structure 
(two-point  statistics)  are  not  enough  to  capture  the  underlying  uncertainty  of 
the  random  field  and  thus  linear  PCA  or  standard  K-L  expansion  is  expected 
to  fail. 


5.1  Generation  of  experimental  samples 


In  [17],  the  experimental  data  was  obtained  through  solution  of  stochastic 
inverse  problems  given  several  realizations  of  the  system  outputs.  This  method 
is  computationally  expensive  if  a  large  number  of  samples  is  required.  To  this 
end,  we  utilize  the  reconstruction  technique  from  an  available  training  image 
to  numerically  generate  samples  of  the  random  field  [29] .  The  basic  idea  is  that 
the  training  image  contains  geological  structure  or  continuity  information  of 
the  underlying  random  field,  and  the  MPS  algorithm  creates  realizations  of  the 
random  field  honoring  this  structural  relationship  [36].  In  this  example,  one  of 
the  MPS  algorithms,  the  single  normal  equation  simulation  ‘snesim’  algorithm 
is  used  to  generate  the  channelized  permeability  [42]  from  the  training  image 
shown  in  Fig.  3.  It  is  a  binary  image  where  1  designates  channel  sand  and 
0  designates  background  mud.  Since  we  are  only  interested  in  the  geological 
structure  not  the  value  itself,  a  further  assumption  is  made  such  that  the 
image  is  a  logarithmic  transformation  of  the  original  permeability  field  and 
the  values  1  and  0  are  the  permeability  values  themselves. 

Using  the  ‘snesim’  algorithm,  1000  realizations  of  a  channelized  permeability 
field,  of  dimension  45  x  45,  are  created  from  the  training  image  in  Fig.  3.  These 
serve  as  the  training  samples  (experimental  data)  of  the  random  field  for  the 
construction  of  KPCA.  Additional  set  of  realizations,  which  is  not  included 
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in  the  training  samples,  will  serve  as  the  test  samples  to  verify  the  accuracy 
of  the  constructed  reduced-order  model  using  KPCA  algorithm.  Some  of  the 
realizations  created  are  shown  in  Fig.  4. 


Realization  1 


20  40 

Realization  10 
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Realization  4 


20  40 

Realization  18 


20  40 


Fig.  4.  Some  of  the  realizations  created  using  the  ‘snesim’  algorithm. 


Each  realization  of  the  random  field  can  be  considered  as  a  2025-dimensional 
vector.  Since  each  vector  consists  of  only  0  and  1,  all  the  samples  occupy 
only  corners  of  a  2025-dimensional  hypercube.  Therefore,  they  are  not  linearly 
related  and  linear  PCA  will  fail.  This  problem  is  similar  to  the  image-denoising 
problem  in  machine  learning  community  where  binary  images  of  digits  are 
used  and  the  performance  of  KPCA  is  proved  to  be  superior  to  that  of  linear 
PCA  [34,38]. 
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5.2  Comparison  between  linear  PC  A  and  Kernel  PC  A 


The  kernels  Eqs.  (16)  and  (17)  are  used  now  to  perform  linear  PCA  and  Ker¬ 
nel  PCA,  respectively,  on  the  1000  sample  realizations.  The  parameter  c  in 
Eq.  (18)  is  taken  to  be  10.  The  kernel  matrices  are  formulated  and  subsequent 
eigenvalue  problems  are  solved.  The  corresponding  first  100  eigenvalues  and 
reconstruction  mean  squared  error  (MSE)  are  shown  in  Fig.  5.  For  non-linear 
data  set,  the  failure  of  linear  PCA  means  overestimating  the  intrinsic  dimen¬ 
sionality  of  the  data.  Therefore,  in  order  to  compare  the  performance  from 
both  methods,  we  need  to  keep  the  same  number  of  eigenvectors.  A  rule  of 
thumb  is  to  choose  r  such  that  Z)i=i  A/ is  sufficiently  close  to  one. 
However,  this  rule  is  not  applied  here.  Since  if  the  reconstruction  MSE  is  suffi¬ 
ciently  small,  there  is  no  need  to  keep  so  many  eigenvectors.  To  this  end,  only 
the  largest  30  eigenvalues  are  retained  for  the  KPCA,  which  corresponds  to 
about  75%  energy  of  the  random  field.  The  associated  reconstruction  MSE  is 
0.003,  which  is  small  enough  to  ensure  an  accurate  expansion  in  the  feature 
space.  On  the  same  time,  we  also  keep  30  eigenvalues  for  linear  PCA,  where 
the  reconstruction  MSE  is  107.242,  which  is  much  larger  than  that  of  KPCA. 
Many  more  eigenvalues  are  needed  to  reduce  the  MSE.  This  observation  par¬ 
tially  indicates  the  failure  of  linear  PCA. 


Fig.  5.  Plots  of  the  eigenspectrum  (left  column)  and  MSE  (right  colnmn)  from  linear 
PCA  (top  row)  and  Kernel  PCA  (bottom  row). 

We  next  try  to  reconstruct  one  of  the  test  samples  using  both  methods. 
The  results  are  shown  in  Fig.  6  with  different  number  of  eigenvectors  re¬ 
tained.  By  reconstruction,  we  mean  we  first  project  the  test  samples  onto  the 
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low-dimensional  space  and  compute  their  coordinates.  Then  using  only  the 
low-dimensional  coordinates,  the  original  sample  in  the  input  space  is  recon¬ 
structed  from  linear  PCA  and  Kernel  PCA  pre-image  algorithm,  respectively. 
10  nearest  neighbors  are  used  in  finding  the  pre-image.  It  is  seen  that  using 
linear  PCA,  the  reconstruction  results  improve  slowly  with  increasing  number 
of  eigenvectors.  A  large  number  of  eigenvectors  are  needed  to  obtain  a  good 
result.  This  is  consistent  with  our  previous  discussion  that  the  linear  PCA 
will  overestimate  the  actual  dimensionality  of  the  data  in  the  nonlinear  case. 
However,  the  reconstructed  permeability  value  is  still  not  correct  even  with 
r  =  500.  On  the  other  hand,  only  30  eigenvectors  are  needed  to  get  a  good 
reconstruction  from  KPCA.  Increasing  the  number  of  eigenvectors  will  not  im¬ 
prove  the  results  since  the  nearest  neighbors  have  been  correctly  identified  and 
the  reconstruction  MSE  in  the  feature  space  is  quite  small.  The  first  9  identi¬ 
fied  nearest  neighbors  of  the  test  sample  are  shown  in  Fig.  7  with  r  =  30.  It  is 
clearly  seen  that  the  first  3  neighbors  have  the  geological  structures  which  are 
most  similar  to  our  test  sample.  This  verifies  that  the  introduced  pre-image 
algorithm  indeed  finds  the  correct  nearest  neighbors  among  the  data  using 
only  the  distances  in  the  features  space  and  input  space. 
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Fig.  6.  Reconstruction  of  a  test  sample  (top  figure)  using  linear  PCA  (middle  row) 
and  Kernel  PCA  (bottom  row)  with  different  number  of  eigenvectors  retained. 

To  further  verify  the  Kernel  PCA  algorithm,  more  test  samples  are  recon¬ 
structed.  To  compare  its  performance  with  that  of  linear  PCA,  30  eigenvectors 
are  retained  in  both  cases  since  it  was  shown  before  that  r  =  30  is  enough  for 
Kernel  PCA.  The  results  are  shown  in  Fig.  8.  Again,  Kernel  PCA  consistently 
gives  more  accurate  results  and  the  reconstruction  samples  are  more  like  the 
original  binary  images. 


20 


Neighbor  #1 


20  40 


Neighbor  #2  Neighbor  #3 


1 

0.5 


0 


Neighbor  #5 


20  40 


Neighbor  #6 

20 
40 

20  40 


1 

0.5 


0 


Neighbor  #8  Neighbor  #9 


20  40  20  40 


Fig.  7.  The  first  9  nearest  neighbors  of  the  test  samples  identified  in  the  feature 
space  with  r  =  30. 
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Fig.  8.  More  reconstruction  results  of  different  test  samples  (top  row)  using  linear 
PCA  (middle  row)  and  Kernel  PCA  (bottom  row)  with  r  —  30. 


In  this  section,  we  compared  the  performance  of  linear  PCA  and  Kernel  PCA 
for  reconstruction.  Although  linear  PCA  is  not  optimal,  it  still  more  or  less 
provided  us  the  desired  geological  structure.  However,  what  we  are  interested 
in  is  the  generation  of  stochastic  realizations.  In  the  next  section,  we  will 
show  that  the  linear  PCA  cannot  give  us  arbitrary  realizations  with  expected 
geological  patterns. 
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5.3  PC  representation  and  stochastic  sampling  in  the  low- dimensional  space 


In  this  section,  we  construct  the  PC  representation  of  the  random  field  using 
the  method  introduced  in  Section  3.2.  We  keep  30  eigenvectors  for  both  linear 
PC  A  and  kernel  PC  A.  First,  the  samples  of  ^  are  computed  by  inserting 
the  1000  sample  realizations  y,  into  Eq.  (26).  Then  these  samples  are  used  to 
construct  the  inverse  marginal  CDF  for  each  dimension  through  kernel  density 
estimation.  Finally,  Eq.  (34)  is  used  to  calculate  the  PC  coefficients  for  each 
dimension.  A  PC  representation  of  the  random  channelized  permeability  is 
constructed  next,  which  is  only  of  30  dimensions,  compared  with  the  original 
2025-dimensional  space.  By  drawing  30  i.i.d.  standard  normal  random  vari¬ 
ables  rii  from  this  low-dimensional  space  and  substituting  them  into  Eqs.  (35) 
and  (36),  any  realization  of  the  underlying  random  permeability  field  can  now 
be  obtained  in  an  inexpensive  way. 

Fig.  9  depicts  4  different  marginal  PDFs  of  the  initial  and  identified  random 
variables  using  the  non- intrusive  projection  method  for  Hermite  chaos  with  in¬ 
creasing  expansion  orders  using  linear  PCA.  The  marginal  PDF  of  initial  ran¬ 
dom  variable  is  obtained  by  plotting  the  kernel  density  estimation  of  the  1000 
sample  coefficients  .  The  marginal  PDF  of  the  identified  random  variable  is 
obtained  by  plotting  the  kernel  density  estimation  of  10000  PC  realizations  of 
These  PC  realizations  are  calculated  by  generating  10000  standard  normal 
random  vectors  and  inserting  them  into  Eq.  (28).  It  is  seen  that  a  p  =  10 
order  expansion  is  enough  to  accurately  identify  the  random  coefficients.  It  is 
also  interesting  to  note  that  the  PC  expansion  converges  slowly  for  the  first 
two  random  coefficients  and  ^2-  This  is  because  more  variance  is  contained 
in  the  larger  eigenvectors  due  to  the  property  of  PCA. 

Some  of  the  realizations  generated  through  sampling  the  PC  representation  of 
linear  PCA  are  shown  in  Pig.  10.  The  failure  of  linear  PCA  is  more  pronounced 
in  this  case.  The  realizations  definitely  do  not  reflect  the  original  channelized 
structure  of  the  permeability  field.  In  addition,  the  permeability  value  is  not 
correctly  predicted. 

Similar  results  are  shown  in  Pigs.  11  and  12  for  Kernel  PCA.  However,  using 
Kernel  PCA,  it  is  seen  that  the  generated  realizations  clearly  show  channelized 
geological  structure  with  correct  permeability  values. 

We  introduce  the  relative  errors  for  statistics  of  the  experimental  samples, 
and  PC  realizations,  }£=i : 
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Fig.  9.  Four  different  marginal  PDFs  of  the  initial  and  identified  random  variables 
using  linear  PC  A  with  Hermite  chaos. 
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Fig.  10.  Realizations  of  the  random  field  by  sampling  the  corresponding  PC  repre¬ 
sentation  using  linear  PCA. 

in  which  Tj  represents  the  appropriate  sample  statistics  of  experimental  sam¬ 
ples  in  fth  grid  block  and  represents  the  corresponding  sample 

statistics  of  PC  realizations  10000  PC  realizations  of  the  random 

permeability  are  generated.  The  results  are  shown  in  Table  1.  From  the  table, 
as  expected,  the  linear  PCA  gives  accurate  mean  and  covariance,  which  are 
first-  and  second-order  statistics.  On  the  other  hand,  the  Kernel  PCA  gives 
better  skewness  and  kurtosis,  which  can  be  considered  as  third-  and  four-order 
statistics,  respectively.  It  is  noted  that  relative  error  of  skewness  is  1.006  for 
linear  PCA,  while  it  is  0.114  for  Kernel  PCA.  Therefore,  the  skewness  is  the 
dominant  higher-order  statistic  in  the  case  of  channelized  permeability  field. 
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Fig.  11.  Four  different  marginal  PDFs  of  the  initial  and  identified  random  variables 
using  Kernel  PC  A  with  Hermite  chaos. 


Fig.  12.  Realizations  of  the  random  field  by  sampling  the  corresponding  PC  repre¬ 
sentation  using  Kernel  PCA. 

Finally,  we  want  to  comment  on  the  overall  statistics  of  generated  PC  realiza¬ 
tions.  From  Table  1,  it  is  seen  that  the  errors  from  Kernel  PCA  are  still  big  for 
covariance  and  kurtosis.  However,  it  is  emphasized  that  we  are  most  interested 
in  the  main  geological  structure  of  the  random  field.  We  cannot  obtain  exactly 
the  same  statistics  as  in  the  experimental  samples.  There  are  three  possible 
reasons.  First,  only  an  approximate  pre-image  is  calculated.  This  process  con¬ 
tributes  significant  portion  of  the  total  error.  Second,  a  PC  expansion  is  used 
to  fit  the  marginal  PDF  of  the  random  coefficients.  From  Fig.  11,  it  is  noted 
that  some  experimental  samples  are  from  the  long  tail  region  of  the  marginal 
PDF.  Thus,  the  experimental  samples  can  be  considered  as  the  extreme  real- 
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izations  of  the  underlying  random  fields.  When  we  sample  the  corresponding 
marginal  PDF  from  PC  expansion,  it  is  clear  that  most  of  the  samples  are 
from  the  high  density  region.  Third,  there  are  only  1000  data  samples  used  to 
compute  the  statistics,  which  may  be  not  enough  to  get  converged  statistics. 
Therefore,  the  statistics  of  the  generated  PC  realizations  are  expected  to  be 
slightly  smaller  than  that  of  experimental  sample.  But  the  PC  realizations 
should  capture  the  main  statistical  features  of  the  experimental  samples.  This 
point  is  demonstrated  in  the  next  section. 

Table  1 

Comparison  of  statistics  between  experimental  samples  and  PC  samples  of  the  ran¬ 
dom  permeability  field. 


Errors 

Mean  vector 

Covariance  matrix 

Skewness  vector 

Kurtosis  vector 

Linear  PC  A 

0.039 

0.193 

1.006 

0.676 

Kernel  PCA 

0.102 

0.427 

0.114 

0.496 

5.4  Forward  uncertainty  propagation  with  the  stochastic  input  model 


In  this  section,  the  generated  stochastic  input  model  is  used  as  an  input  to 
the  single- phase  flow  problem  on  the  domain  [0, 1]^.  The  governing  equations 
of  the  single-phase  flow  are  [43] 


V  •  m)  =  0,  u(x,u!)  =  —a{x,u!)'Vp{x,u)  Wx  E  D,  (49) 

dS{x  t, uj)  ^  ^  ^  Vx  e  D,te[0,T].  (50) 

ot 

Flow  is  induced  from  left-to-right  with  Dirichlet  boundary  conditions  p  =  1 
on  {xi  =  0},  p  =  0  on  {xi  =  1}  and  no-flow  homogeneous  Neumann  bound¬ 
ary  conditions  on  the  other  two  edges.  We  also  impose  zero  initial  condition 
for  saturation  S{x,0)  =  0  and  boundary  condition  S{x,t)  =  1  on  the  in¬ 
flow  boundary  {xi  =  0}.  Mixed  finite  element  method  developed  in  [43]  is 
used  to  solve  the  above  equations  with  spatial  discretization  45  x  45.  So  the 
permeability  is  defined  as  a  constant  in  each  grid  block. 

The  stochastic  permeability  is  a  =  exp(y)  if  the  experimental  samples  are 
used  or  a  =  exp(y’^p'')  if  the  PC  realizations  from  the  stochastic  input  model 
are  used  in  the  computation.  Monte  Carlo  (MC)  simulation  is  then  conducted 
using  both  1000  experimental  samples  directly  and  generated  PC  realizations. 
50000  PC  realizations  are  generated  by  sampling  the  low-dimensional  space. 
The  stochastic  input  model  is  the  same  as  the  one  used  to  generate  the  real¬ 
izations  in  the  last  section. 
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(c)  (d) 


Fig.  13.  Contour  of  saturation  at  0.2  PVI:  MC  mean  (a)  and  variance  (b)  from 
experimental  samples;  MC  mean  (c)  and  variance  (d)  from  PC  realizations. 


The  contour  plots  of  saturation  at  0.2  PVI  are  given  in  Fig.  13.  PVI  represents 
dimensionless  time  and  is  computed  as  PVI  =  J  Q  dt/Vp,  where  Vp  is  the  total 
pore  volume  of  the  system,  which  is  equal  to  the  area  of  the  domain  D  here 
and  Q  =  fQ£,out(uh  ■  n)  ds  is  the  total  flow  rate  on  the  out  flow  boundary 
The  water  cut  curves  are  also  given  in  Fig.  14.  The  water-cut  is  de¬ 
fined  as  F{t)  =  d" ■  figures,  it  is  seen  that  we  obtain 

JaDoutiUh-n)  ds 

nearly  the  same  mean  saturation  and  mean  water  cut  curves  from  the  experi¬ 
mental  samples  and  the  PC  realizations.  However,  the  variances  of  saturation 
and  water  curve  curves  are  not  exactly  the  same.  In  general,  the  value  of  the 
variance  from  PC  realizations  is  smaller  than  that  from  the  experimental  sam¬ 
ples.  But  it  should  be  noted  that  the  results  from  the  PC  realizations  capture 
the  same  main  features  as  the  results  from  the  experimental  samples.  To  be 
specific,  the  regions  with  the  highest  variance  values  of  saturation  are  nearly 
the  same  in  both  cases.  The  shapes  of  the  variance  of  the  flow  curve  are  also 
the  same  in  both  cases.  These  observations  are  expected.  As  explained  before, 
the  experimental  samples  can  be  considered  as  the  extreme  realizations  of  the 
stochastic  model.  On  the  other  hand,  most  of  the  PC  realizations  are  from 
the  highest  probability  density  region  of  the  same  stochastic  space.  Together 
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with  the  fact  that  there  are  only  1000  experimental  samples,  it  is  obvious  that 
the  MC  results  from  experimental  data  are  far  from  converged  results.  If  more 
experimental  samples  available,  the  results  are  expected  to  converge  to  the 
results  using  our  low-dimensional  model.  This  is  possibly  expensive  in  practi¬ 
cal  engineering  problems.  On  the  other  hand,  our  proposed  stochastic  input 
model  provides  a  fast  way  to  generate  many  realizations,  which  are  consistent, 
in  a  useful  sense,  with  the  experimental  data. 


Fig.  14.  Comparison  of  water  cut  curves:  (a)  Mean;  (b)  Variance. 

Finally,  we  want  to  comment  on  the  computational  time.  To  generate  the 
1000  experimental  data  using  the  ‘snesim’  algorithm,  it  took  approximate  30 
minutes  on  a  single  processor  machine.  On  the  other  hand,  to  generate  50000 
PC  realizations  from  our  reduced-order  model,  less  than  1  minute  is  needed 
on  the  same  machine.  In  addition,  the  process  of  generating  PC  realizations 
is  easily  parallelized,  which  is  used  in  our  MC  simulations. 


6  Conclusion 


In  this  paper,  a  new  parametric  stochastic  reduced-order  modeling  method 
is  proposed,  which  can  efficiently  preserve  higher-order  statistics  of  the  ran¬ 
dom  held  given  limited  experimental  data.  This  method  relies  on  the  theory 
of  Kernel  PCA  to  transform  the  data  into  a  feature  space,  in  which  the  data 
is  more  linearly  related  than  in  the  original  input  space.  PC  expansion  is 
then  utilized  to  identify  the  random  coefficients  in  the  Kernel  PCA  formula¬ 
tion.  A  simple  but  accurate  pre-image  algorithm  is  also  introduced  to  project 
the  generated  PC  realizations  back  to  the  original  input  space.  A  thorough 
comparison  between  the  Linear  PCA  and  Kernel  PCA  on  reduced-order  mod¬ 
eling  of  the  channelized  permeability  held  is  conducted.  As  expected,  Kernel 
PCA  gives  more  accurate  realizations  which  rehects  the  original  channelized 
geological  structure  with  much  less  eigenvectors  retained.  This  results  in  a 
low-dimensional  stochastic  input  space.  On  the  other  hand,  linear  PCA  fails 
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to  capture  the  channelized  structure  with  the  same  number  of  eigenvectors 
due  to  the  nonlinearity  of  the  data.  Forward  uncertainty  quantification  is  also 
conducted  which  shows  the  introduced  stochastic  input  model  indeed  captures 
the  main  statistical  properties  of  the  underlying  random  field. 

As  a  first  step  towards  the  implementation  of  this  method,  the  independence 
between  the  random  coefficients  was  assumed.  Although  this  gives  us  meaning¬ 
ful  results  in  the  numerical  examples  considered,  the  effect  taking  the  depen¬ 
dence  into  account  as  in  [19]  using  Rosenblatt  transformation  needs  further 
investigation.  In  addition,  the  Euclidean  distances  between  data  is  directly 
used  as  the  distance  measure  between  data.  As  in  [37],  it  will  be  more  in¬ 
teresting  to  take  distances  between  flow  responses  as  the  distance  measure 
between  permeability  data.  This  is  expected  to  be  more  accurate  since  it  in¬ 
corporates  the  information  of  the  underlying  stochastic  system. 

The  basic  model  reduction  ideas  envisioned  in  this  work  are  not  limited  to  gen¬ 
eration  of  viable  stochastic  input  models  for  property  variations.  This  frame¬ 
work  has  direct  applicability  to  inverse  problems  [13],  where  the  generated 
model  can  be  considered  as  the  prior  model  of  all  available  properties.  Thus, 
finding  the  unknown  property  is  only  limited  to  the  low-dimensional  space  and 
is  expected  to  be  more  efficient  than  working  in  the  original  high-dimensional 
space.  Furthermore,  the  generation  of  a  low-dimensional  surrogated  space  pro¬ 
vides  a  prerequisite  in  stochastic  low-dimensional  modeling  of  SPDEs  [44], 
which  may  have  major  ramifications  in  design  under  uncertainty  and  stochas¬ 
tic  optimization  problems.  These  potentially  exciting  areas  of  application  of 
our  proposed  framework  offer  fertile  avenues  of  further  research. 
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